Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval
Authors
Abstract
Semantically complex queries that include attributes of objects and relations between objects still pose a major challenge to image retrieval systems. Recent work in computer vision has shown that a graph-based semantic representation called a scene graph is an effective representation both for very detailed image descriptions and for complex retrieval queries. In this paper, we show that scene graphs can be created automatically and effectively from a natural language scene description. We present a rule-based and a classifier-based scene graph parser whose output can be used for image retrieval. We show that including relations and attributes in the query graph outperforms a model that only considers objects, and that using the output of our parsers is almost as effective as using human-constructed scene graphs (Recall@10 of 27.1% vs. 33.4%). Additionally, we demonstrate the general usefulness of parsing to scene graphs by showing that the output can also be used to generate 3D scenes.
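As an illustration only, the scene graph representation described above (objects, attributes attached to objects, and relations between objects) and the Recall@K metric used to report the retrieval results can be sketched as follows. The `SceneGraph` class and `recall_at_k` function are hypothetical names; this is not the authors' parser or evaluation code, and it assumes exactly one gold image per query.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # Objects are nodes; an attribute is (object index, attribute);
    # a relation is a (subject index, predicate, object index) triple.
    objects: list = field(default_factory=list)
    attributes: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def add_object(self, name: str) -> int:
        """Add an object node and return its index."""
        self.objects.append(name)
        return len(self.objects) - 1


def recall_at_k(rankings, gold_ids, k=10):
    """Fraction of queries whose gold image appears among the top-k results.

    rankings: one ranked list of image ids per query (best first).
    gold_ids: the single correct image id for each query (assumption).
    """
    hits = sum(1 for ranked, gold in zip(rankings, gold_ids) if gold in ranked[:k])
    return hits / len(gold_ids)


# Hand-built graph for the description "a black cat sitting on a wooden table"
g = SceneGraph()
cat = g.add_object("cat")
table = g.add_object("table")
g.attributes += [(cat, "black"), (table, "wooden")]
g.relations.append((cat, "sitting on", table))
```

A parser in the sense of the abstract would produce such a graph from the text automatically; an objects-only baseline would simply ignore the `attributes` and `relations` lists when matching a query against images.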
Similar resources
Scene Graph Parsing as Dependency Parsing
In this paper, we study the problem of parsing structured knowledge graphs from textual descriptions. In particular, we consider the scene graph representation (Johnson et al., 2015) that considers objects together with their attributes and relations: this representation has been proved useful across a variety of vision and language applications. We begin by introducing an alternative but equiv...
Ontology-Based Image Retrieval
The binary form of an image does not tell what the image is about. It is possible to retrieve images from a database using pattern matching techniques, but usually textual descriptions attached to the images are used. Semantic web ontology and metadata languages provide a new way to annotating and retrieving images. This paper considers the situation when a user is faced with an image repositor...
Evaluating a text-to-scene generation system as an aid to literacy
We discuss classroom experiments using WordsEye, a system for automatically generating 3D scenes from English textual descriptions. Input is syntactically and semantically processed to identify a set of graphical objects and constraints which are then rendered as a 3D scene. We describe experiments with the system in a summer literacy enrichment program conducted at the Harlem Educational Activ...
Graph Grammar Based Object Recognition for Image Retrieval
In order to retrieve a set of intended images from an image archive, human beings think of special contents with respect to the searched scene. The necessity of a semantics-based retrieval leads to a content-based analysis and retrieval of images. From this point of view, our project Image Retrieval for Information Systems (IRIS) develops and combines methods and techniques of computer vision...
Image Generation from Scene Graphs
To truly understand the visual world our models should be able not only to recognize images but also generate them. To this end, there has been exciting recent progress on generating images from natural language descriptions. These methods give stunning results on limited domains such as descriptions of birds or flowers, but struggle to faithfully reproduce complex sentences with many objects a...
Publication date: 2015